{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0674ed16",
   "metadata": {},
   "source": [
    "# Querying Factor Models\n",
    "\n",
    "The GS Quant `FactorRiskModel` class allows users to access vendor factor models such as Barra. The `FactorRiskModel` interface supports date-based querying of the factor risk model outputs such as factor returns, covariance matrix and specific risk for assets.\n",
    "\n",
    "In this tutorial, we’ll look at querying available risk models, their coverage universe, and how to access the returns and volatility of factors in the model. We also show how to query factor exposures (z-scores), specific risk and total risk for a given set of assets in the model's universe.\n",
    "\n",
    "The factor returns represent the regression outputs of the model for each day. The definitions of each factor vary depending on the model. More details can be found in the [Marquee Data Catalog](https://marquee.gs.com/s/discover/data-services/catalog?query=factor+risk+model)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c995452e",
   "metadata": {},
   "source": [
    "## Step 1: Authenticate and Initialize Your Session\n",
    "\n",
    "First you will import the necessary modules and add your client id and client secret."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae1681e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "import warnings\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.ticker as mtick\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "from gs_quant.timeseries import beta\n",
    "from gs_quant.session import GsSession, Environment\n",
    "from gs_quant.models.risk_model import FactorRiskModel\n",
    "from gs_quant.models.risk_model import Measure, RiskModelUniverseIdentifierRequest as Identifier, DataAssetsRequest\n",
    "\n",
    "client = None\n",
    "secret = None\n",
    "\n",
    "## External users must fill in their client ID and secret below and comment out the lines below\n",
    "\n",
    "# client = 'ENTER CLIENT ID'\n",
    "# secret = 'ENTER CLIENT SECRET'\n",
    "\n",
    "GsSession.use(Environment.PROD, client_id=client, client_secret=secret)\n",
    "warnings.filterwarnings(\"ignore\", category=RuntimeWarning)\n",
    "\n",
    "print('GS Session initialized.')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "floral-apollo",
   "metadata": {},
   "source": [
    "## Step 2: Pull Risk Model Coverage\n",
    "\n",
    "Popular third party risk models that have been onboarded onto Marquee for programmatic access are below.\n",
    "\n",
    "| Risk Model Name                     | Risk Model Id      | Description|\n",
    "|-------------------------------------|--------------------|-------------\n",
    "| Barra USMED Short (BARRA_USMEDS)    | BARRA_USMEDS       |Barra (MSCI) US Total Market Equity Model for medium-term investors. Includes all styles from the long-term model plus additional factors for investment horizons between 1 month and 1 year (responsive variant).|\n",
    "| Barra USSLOW Long (BARRA_USSLOWL)   | BARRA_USSLOWL      |Barra (MSCI) US Total Market Equity Model for long-term investors. Designed with a focus on portfolio construction and reporting for long investment horizons (stable variant).|\n",
    "| Barra GEMLT Long (BARRA_GEMLTL)     | BARRA_GEMLTL       |Barra (MSCI) Global Total Market Equity Model for long-term investors. Designed with a focus on portfolio construction and reporting for global equity investors (stable variant).|\n",
    "| Barra US Fast (BARRA_USFAST)        | BARRA_USFAST       |Barra (MSCI) US Equity Trading Model for short-term investors. Includes all styles from the medium-term model plus additional factors for shorter investment horizons.|\n",
    "| Wolfe Developed Markets All-Cap v1  | WOLFE_QES_DM_AC_1  | Wolfe's Developed Markets All-Cap model is intended for global portfolios with an emphasis on Developed Markets. The model combines next generation factors like short interest and interest rate sensitivity with conventional factors like value and growth.|\n",
    "| Wolfe US TMT v2                     | WOLFE_QES_US_TMT_2 | Wolfe's US TMT model is intended for sector portfolios. The model uses sector-specific factors in a TMT estimation universe to explain more systematic risk and return relative to a broad market model.|\n",
    "| Wolfe Europe All-Cap v2.1           | WOLFE_QES_EU_AC_21 | Wolfe's Europe All-Cap model is intended for European portfolios with a focus on the developed markets. The model combines next generation factors like short interest and interest rate sensitivity with conventional factors like value and growth.|\n",
    "| Wolfe US Healthcare v2              | WOLFE_QES_US_HC_2  | Wolfe's US Healthcare model is intended for sector portfolios. The model uses sector-specific factors in a Healthcare estimation universe to explain more systematic risk and return relative to a broad market model.|\n",
    "\n",
    "After selecting a risk model, we can create an instance of the risk model to pull information on the model coverage such as the available dates, asset coverage universe, available factors and model description. The `RiskModelCoverage` enum of the model indicates whether the scope of the universe is Global, Region or Country and the `Term` enum refers to the horizon of the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "social-attendance",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "# Check available history for a factor model to decide start and end dates\n",
    "available_days = factor_model.get_dates()\n",
    "\n",
    "print(f'Data available for {model_id} from {available_days[0]} to {available_days[-1]}')\n",
    "print(\n",
    "    f'{model_id}:\\n - Name: {factor_model.name}\\n - Description: {factor_model.description}\\n - Coverage: {factor_model.coverage.value}\\n - Horizon: {factor_model.term.value}'\n",
    ")\n",
    "print(f'For all info https://marquee.gs.com/v1/risk/models/{model_id}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "median-individual",
   "metadata": {},
   "source": [
    "## Step 3: Query Factor Data\n",
    "\n",
    "The following parameters are required for querying factor data:\n",
    "\n",
    "* `start_date` - date or datetime that is a business day\n",
    "* `end_date` - date or datetime that is a business day. If an end date is not specified, it will default to the last available date"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "documentary-trail",
   "metadata": {},
   "source": [
    "##### Get Available Factors\n",
    "For each model, we can retrieve a list of factors available. Each factor has a `name`, `id`, `type` and `factorCategory`.\n",
    "\n",
    "A factor's `factorCategory` can be one of the following:\n",
    "* Style - balance sheet and market metrics\n",
    "* Industry - an asset's line of business (i.e. Barra uses GICS classification)\n",
    "* Country - reference an asset’s exchange country location\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "improving-recycling",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "start_date = dt.date(2021, 1, 4)  # Set your start date, e.g: dt.date(2021, 1, 4)\n",
    "end_date = dt.date(2021, 1, 26)  # Set your end date, e.g: dt.date(2021, 1, 26)\n",
    "\n",
    "available_factors = factor_model.get_factor_data(start_date, end_date).set_index('identifier')\n",
    "available_factors.sort_values(by=['factorCategory'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cosmetic-contrary",
   "metadata": {},
   "source": [
    "##### Get All Factor Returns\n",
    "\n",
    "To query factor returns, we can either use `get_factor_returns_by_name` to retrieve the returns with names or `get_factor_returns_by_id` to get the returns with factor ids. We can leverage [the timeseries package](https://developer.gs.com/docs/gsquant/data/data-analytics/timeseries/) to transform and visualize the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "equal-thailand",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "date = dt.date(2020, 1, 4)  # Set your date, e.g. dt.date(2020, 1, 4)\n",
    "\n",
    "factor_returns = factor_model.get_factor_returns_by_name(date)\n",
    "fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))\n",
    "\n",
    "factor_returns[['Growth', 'Momentum', 'Size']].cumsum().plot(\n",
    "    title='Factor Performance over Time for Risk Model', ax=ax[0]\n",
    ")\n",
    "factor_beta = beta(factor_returns['Growth'], factor_returns['Momentum'], 63, prices=False)\n",
    "factor_beta.plot(title='3m Rolling Beta of Growth to Momentum', ax=ax[1])\n",
    "fig.autofmt_xdate()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "removable-boards",
   "metadata": {},
   "source": [
    "##### Covariance Matrix\n",
    "\n",
    "The covariance matrix represents an N-factor by N-factor matrix with the diagonal representing the variance of each factor for each day. The covariance matrix is in daily variance units."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bibliographic-medicare",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "start_date = dt.date(2021, 1, 4)  # Set your start date, e.g: dt.date(2021, 1, 4)\n",
    "end_date = dt.date(2021, 1, 26)  # Set your end date, e.g: dt.date(2021, 1, 26)\n",
    "\n",
    "cov_matrix = factor_model.get_covariance_matrix(start_date, end_date) * 100\n",
    "\n",
    "# set display options below--set max_rows and max_columns to None to return full dataframe\n",
    "max_rows = 10\n",
    "max_columns = 7\n",
    "pd.set_option('display.max_rows', max_rows)\n",
    "pd.set_option('display.max_columns', max_columns)\n",
    "\n",
    "# get the last available matrix\n",
    "round(cov_matrix.loc['DATE: e.g. 2021-02-26'], 6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "operating-hearing",
   "metadata": {},
   "source": [
    "##### Factor Correlation and Volatility\n",
    "\n",
    "The `Factor` Class allows for quick analytics for a specified factor to easily support comparing one factor across different models or to another factor.\n",
    "\n",
    "The factor volatility and correlation functions use the covariance matrix for calculations:\n",
    "* Volatility is the square root of the diagonal\n",
    "* Correlation is derived from the covariance matrix by dividing the cov(x,y) by the vol(x) * vol(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "outdoor-security",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "momentum = factor_model.get_factor('Momentum')\n",
    "growth = factor_model.get_factor('Growth')\n",
    "\n",
    "start_date = dt.date(2021, 1, 4)  # Set your start date, e.g: dt.date(2021, 1, 4)\n",
    "end_date = dt.date(2021, 1, 26)  # Set your end date, e.g: dt.date(2021, 1, 26)\n",
    "\n",
    "vol = momentum.volatility(start_date, end_date)\n",
    "corr = momentum.correlation(growth, start_date, end_date)\n",
    "\n",
    "# plot\n",
    "fig, ax1 = plt.subplots()\n",
    "ax2 = ax1.twinx()\n",
    "ax1.plot(vol.index, corr * 100, 'g-', label='Momentum vs Growth Correlation (LHS)')\n",
    "ax1.yaxis.set_major_formatter(mtick.PercentFormatter())\n",
    "ax2.yaxis.set_major_formatter(mtick.PercentFormatter())\n",
    "ax2.plot(vol.index, vol * 1e4, 'b-', label='Momentum Volatility (RHS)')\n",
    "plt.xticks(vol.index.values[::30])\n",
    "fig.legend(loc=\"lower right\", bbox_to_anchor=(0.75, -0.10))\n",
    "fig.autofmt_xdate()\n",
    "plt.title('Momentum vs Growth Historical Factor Analysis')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21845fe4",
   "metadata": {},
   "source": [
    "#### Quick Hint:\n",
    "\n",
    "You can also get factor volatility for many factors all at once!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10df2a94",
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_model.get_factor_volatility(start_date, end_date)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "identified-terminology",
   "metadata": {},
   "source": [
    "## Step 4: Query Asset Data\n",
    "\n",
    "The factor risk represents the beta coefficient that can be attributed to the model, whereas the specific (residual) risk refers to the error term that is not explained by the model.\n",
    "\n",
    "| Measure                    | Definition    |\n",
    "|----------------------------|---------------|\n",
    "| `Specific Risk`            | Annualized idiosyncratic risk or error term which is not attributable to factors in percent units |\n",
    "| `Total Risk`               | Annualized risk which is the sum of specific and factor risk in percent units |\n",
    "| `Historical Beta`          | The covariance of the residual returns relative to the model's estimation universe or benchmark (i.e results of a one factor model)  |\n",
    "| `Residual Variance`        | Daily error variance that is not explained by the model which is equal to $$\\frac{({\\frac{\\text{Specific Risk}}{100}})^2}{252}$$ |\n",
    "| `Universe Factor Exposure` | Z-score for each factor relative to the model's estimation universe |\n",
    "| `Predicted Beta`           | The beta coefficient derived from predictive models; it estimates the asset’s expected sensitivity to market or factor fluctuations, reflecting forecast risk exposure |\n",
    "| `Daily Return`             | The daily percentage change in an asset’s price, representing short-term performance fluctuations |\n",
    "| `R Squared`                | The proportion of asset return variance explained by the model’s risk factors, expressed as a percentage |\n",
    "| `Factor Return`            | The portion of the asset return that is attributed to the risk factor(s) |\n",
    "| `Dividend Yield`           | The annual dividend income expressed as a percentage of the asset's current price |\n",
    "\n",
    "We can retrieve an asset universe on a given date by passing in an empty list and a `RiskModelUniverseIdentifierRequest` to the `DataAssetsRequest`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "spatial-problem",
   "metadata": {},
   "source": [
    "##### Get Risk Model Universe Coverage\n",
    "\n",
    "Note: \n",
    "You can query asset-related risk model data with the folloing identifiers:\n",
    "|Identifier |\n",
    "|-----------|\n",
    "|BBID |\n",
    "|BCID |\n",
    "|SEDOL |\n",
    "| CUSIP|\n",
    "|ISIN |\n",
    "|GSID |, bcid, sedol, cusip, isin, gsid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "short-kennedy",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "date = dt.date(2021, 1, 4)  # Set your date, e.g: dt.date(2021, 1, 4)\n",
    "\n",
    "asset_universe_for_request = DataAssetsRequest(Identifier.gsid, [])  # entire universe\n",
    "universe_on_date = factor_model.get_asset_universe(date, assets=asset_universe_for_request)\n",
    "\n",
    "# set display options below--set max_rows to None to return full list of identifiers\n",
    "max_rows = 10\n",
    "pd.set_option('display.max_rows', max_rows)\n",
    "universe_on_date"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "relevant-brooklyn",
   "metadata": {},
   "source": [
    "##### Query Aggregated Risk\n",
    "\n",
    "For asset data, we can query for a specific measure or pull data for a list of measures over a range of dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "copyrighted-elements",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "asset_bbid = 'ASSET BBID (e.g. \"AAPL UW\")'\n",
    "\n",
    "start_date = dt.date(2021, 1, 4)  # Set your start date, e.g: dt.date(2021, 1, 4)\n",
    "end_date = dt.date(2021, 1, 26)  # Set your end date, e.g: dt.date(2021, 1, 26)\n",
    "\n",
    "# get risk\n",
    "universe_for_request = DataAssetsRequest(Identifier.bbid, [asset_bbid])\n",
    "specific_risk = factor_model.get_specific_risk(start_date, end_date, universe_for_request)\n",
    "total_risk = factor_model.get_total_risk(start_date, end_date, universe_for_request)\n",
    "factor_risk = total_risk - specific_risk\n",
    "\n",
    "plt.stackplot(\n",
    "    total_risk.index, specific_risk[asset_bbid], factor_risk[asset_bbid], labels=['Specific Risk', 'Factor Risk']\n",
    ")\n",
    "plt.title(f'{asset_bbid} Risk')\n",
    "plt.xticks(total_risk.index.values[::50])\n",
    "plt.legend(loc='upper right')\n",
    "plt.gcf().autofmt_xdate()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "frequent-lithuania",
   "metadata": {},
   "source": [
    "##### Query Factor Exposures (z-scores)\n",
    "\n",
    "When querying the asset factor exposures, set the `limit_factor` to True to receive only non zero exposures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "micro-processor",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "asset_bbid = 'ASSET BBID (e.g. \"AAPL UW\")'\n",
    "universe_for_request = DataAssetsRequest(Identifier.bbid, [asset_bbid])\n",
    "\n",
    "start_date = dt.date(2021, 1, 4)  # Set your start date, e.g: dt.date(2021, 1, 4)\n",
    "end_date = dt.date(2021, 1, 26)  # Set your end date, e.g: dt.date(2021, 1, 26)\n",
    "\n",
    "factor_exposures = factor_model.get_universe_factor_exposure(start_date, end_date, universe_for_request)\n",
    "\n",
    "available_factors = factor_model.get_factor_data(dt.date(2020, 1, 4)).set_index('identifier')\n",
    "available_factors.sort_values(by=['factorCategory']).tail()\n",
    "\n",
    "factor_exposures.columns = [available_factors.loc[x]['name'] for x in factor_exposures.columns]\n",
    "\n",
    "sns.boxplot(data=factor_exposures[['Beta', 'Momentum', 'Growth', 'Profitability']])\n",
    "plt.title(f'Distribution of {asset_bbid} Factor Exposures since 1/4/20')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "greatest-graham",
   "metadata": {},
   "source": [
    "##### Query Multiple Asset Measures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "subtle-young",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "model_id = 'RISK MODEL ID'\n",
    "factor_model = FactorRiskModel.get(model_id)\n",
    "\n",
    "# get multiple measures across a date range for a universe specified\n",
    "start_date = dt.date(2021, 1, 4)  # Set your start date, e.g: dt.date(2021, 1, 4)\n",
    "end_date = dt.date(2021, 1, 26)  # Set your end date, e.g: dt.date(2021, 1, 26)\n",
    "\n",
    "asset_bbid = 'ASSET BBID (e.g. \"AAPL UW\")'\n",
    "universe_for_request = DataAssetsRequest(Identifier.bbid, [asset_bbid])\n",
    "\n",
    "data_measures = [\n",
    "    Measure.Universe_Factor_Exposure,\n",
    "    Measure.Asset_Universe,\n",
    "    Measure.Historical_Beta,\n",
    "    Measure.Specific_Risk,\n",
    "    Measure.Factor_Name,\n",
    "    Measure.Factor_Id,\n",
    "]\n",
    "asset_risk_data = factor_model.get_data(data_measures, start_date, end_date, universe_for_request, limit_factors=True)\n",
    "\n",
    "for i in range(len(asset_risk_data.get('results'))):\n",
    "    date = asset_risk_data.get('results')[i].get('date')\n",
    "    universe = asset_risk_data.get('results')[i].get('assetData').get('universe')\n",
    "    factor_exposure = asset_risk_data.get('results')[i].get('assetData').get('factorExposure')\n",
    "    factor_id_to_name = asset_risk_data.get('results')[i].get('factorData')\n",
    "    # Create a mapping of factorId to factorName\n",
    "    factor_id_to_name_map = {item['factorId']: item['factorName'] for item in factor_id_to_name}\n",
    "    # Replace factor IDs with factor names in factor_exposure\n",
    "    factor_exposure_with_names = [\n",
    "        {factor_id_to_name_map.get(fid, fid): value for fid, value in exposure.items()} for exposure in factor_exposure\n",
    "    ]\n",
    "    historical_beta = asset_risk_data.get('results')[i].get('assetData').get('historicalBeta')\n",
    "    specific_risk = asset_risk_data.get('results')[i].get('assetData').get('specificRisk')\n",
    "    print(f'date: {date}')\n",
    "    print(f'universe: {universe}')\n",
    "    print(f'factor id to factor exposure: {factor_exposure}')\n",
    "    print(f'factor name to factor exposure: {factor_exposure_with_names}')\n",
    "    print(f'historical beta: {historical_beta}')\n",
    "    print(f'specific risk: {specific_risk}')\n",
    "    print('\\n')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
